/********************************************************************************
Discrimination in Multi-Phase Systems: Evidence from Child Protection

Created on: 12/28/2022
Last Modified on: 2/17/2024

Description: This program generates the sample of investigations from 2017 to 2019
needed to extend the Gross and Baron (2022) dataset through 2019. 

Note that we have removed the file directory names from this program for 
confidentiality reasons. 
********************************************************************************/

**************************
**(0) SETUP
**************************
clear
set more off
macro drop _all
capture log close

*Set directories 
global clean 
global cleandata 
global tmpdata 

**************************
**(1) GENERATE SAMPLE OF INVESTIGATIONS FROM 2017 TO 2019
**************************
use "${tmpdata}screened_out_calls_qje.dta", clear
append using "${tmpdata}screened_in_calls_qje.dta"
tab screened

**Drop small number of observations with invalid child ids 
drop if childpartyid==0 
drop if childpartyid==99

**Generate unique screener ids 
foreach x in scrnr_last_nm scrnr_first_nm {
	replace `x'=subinstr(`x'," ","",.)
	replace `x'=subinstr(`x',",","",.)
	replace `x'=subinstr(`x',".","",.)	
	replace `x'=subinstr(`x',"-","",.)	
	replace `x'=lower(`x')
}

egen screener = group(scrnr_first_nm scrnr_last_nm)
sum screener 

**Keep only variables of interest 
keep childpartyid intake_id complaint_date complaint_dttm screened whi bla mom dad rel notrel ///
child_age child_sex county screener scrnr_first_nm scrnr_last_nm phyab phyneg impsup threat sexab miss_alleg ///
medneg failprot maltreatment court mdhhs clergy birthmatch miss_reporter provider bcal edu law family medical other counselor zipcode_vic

**Generate primary outcome: subject of another investigation within six months 
*Make the date variable uniform
gen cw_date = complaint_date
replace cw_date = substr(complaint_date,1,10) if screened==0
drop complaint_date

*Gen a stata date variable 
gen cw_date_stata = date(cw_date,"YMD")
order cw_date_stata 

*Where parsing failed, retry using only the first 10 characters of the date string
gen cw_date2 = substr(cw_date,1,10) 
order cw_date2

gen cw_date_stata2 = date(cw_date2,"YMD")
replace cw_date_stata = cw_date_stata2 if cw_date_stata==.
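
*Optional check (a sketch, not part of the original program): show the parsed dates
*in a readable daily format and count calls whose date still could not be parsed.
*The format command changes only how the variable is displayed, not its values.
format cw_date_stata %td
count if missing(cw_date_stata)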

*For a given focal child X call observation, flag whether the child's next call
*falls 1 to 180 days later and was screened in (i.e., led to an *investigation*)
gen inv6m = .

order childpartyid cw_date screened inv*

*Sort by date within child so that [_n+1] refers to the child's next call
bysort childpartyid (cw_date_stata): replace inv6m = 1 if inrange(cw_date_stata[_n+1], cw_date_stata+1, cw_date_stata+180)
replace inv6m = 0 if inv6m==.

*Do not count the next call if it was screened out
bysort childpartyid (cw_date_stata): replace inv6m = 0 if screened[_n+1]==0

label var inv6m "Subject of another investigation within 6 months"
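
*Optional check (a sketch, not part of the original program): cross-tabulate the
*outcome against screening status and confirm that inv6m is never missing at this point.
tab inv6m screened, missing
assert !missing(inv6m)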

*Keep only investigations from April 24, 2017 on (6 months prior to October 2017)
*td(24apr2017) is the Stata daily date 20933
keep if cw_date_stata >= td(24apr2017)

*Keep only screened-in calls 
keep if screened==1 

*Did this investigation end up in foster care?
*Unmatched investigations are dropped below, so only those found in the removals file remain
merge 1:1 childpartyid intake_id using "${tmpdata}2017_19_removals.dta", keep(1 3)
drop if _merge==1
replace fc=0 if fc==.

*Keep only calls through June 30, 2019 (since we need to observe outcomes within the next 6 months)
*td(30jun2019) is the Stata daily date 21730
keep if cw_date_stata <= td(30jun2019)

*Generate variables needed to match the earlier sample
gen cps_year = year(cw_date_stata)
rename (childpartyid intake_id) (vicid inv_caseid)

*Destring zipcode variable that will be used to adjust for strata fixed effects 
replace zipcode_vic = "." if zipcode_vic == "NA"
destring zipcode_vic, replace 

**Generate additional variables needed for the analysis 
*Age
replace child_age ="" if child_age=="NA"
destring child_age, replace 
replace child_age=0 if child_age<0
replace child_age = round(child_age)
rename child_age age 
gen miss_age = age==.
replace age=18 if age==. 
rename age age_inv 

*Gender
gen female = child_sex=="f"
replace female=. if child_sex=="NA"

*Rename maltreatment variable
rename maltreatment maltreat

*Keep only relevant variables
keep sexab zipcode_vic cps_year white black inv6m worker_id vicid inv_caseid cw_date_stata fc ///
phyab phyneg impsup medneg failprot maltreat mom dad rel notrel female age_inv 

save "${tmpdata}sample_inv_2017_2019.dta", replace 
